Unsupervised word sense disambiguation in dynamic semantic spaces

Author

  • Jean-François Delpech
Abstract

In this paper, we are mainly concerned with the ability to quickly and automatically distinguish word senses in dynamic semantic spaces in which new terms and new senses appear frequently. Such spaces are built “on the fly” from constantly evolving data sets such as Wikipedia, repositories of patent grants and applications, or large sets of legal documents for Technology Assisted Review and e-discovery. This immediacy rules out supervision as well as the use of a priori training sets. We show that the various senses of a term can be automatically made apparent with a simple clustering algorithm, each sense being a vector in the semantic space. While we only consider here semantic spaces built by using random vectors, this algorithm should work with any kind of embedding, provided meaningful similarities between terms can be computed and do fulfill at least the two basic conditions that terms with close meanings have high similarities and terms with unrelated meanings have near-zero similarities.
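
The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of the general idea, not the author's algorithm. It assumes the neighbors of an ambiguous term are already available as vectors, uses cosine similarity, and groups them greedily; the function names, the `threshold` value, and the toy data are all hypothetical. Each resulting cluster mean plays the role of one sense vector.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def induce_senses(neighbor_vectors, threshold=0.5):
    """Greedy single-pass clustering (an illustrative assumption).

    Each vector joins the cluster whose centroid is most similar to it,
    provided that similarity reaches `threshold`; otherwise it starts a
    new cluster. Returns one centroid ("sense vector") per cluster.
    """
    clusters = []  # each cluster is a list of vectors
    for v in neighbor_vectors:
        best, best_sim = None, threshold
        for members in clusters:
            sim = cosine(v, np.mean(members, axis=0))
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([v])
        else:
            best.append(v)
    return [np.mean(members, axis=0) for members in clusters]

# Toy data: two unrelated random "topic" directions in a 300-dimensional
# space. Random high-dimensional vectors are nearly orthogonal, which mimics
# the condition that unrelated meanings have near-zero similarity.
rng = np.random.default_rng(0)
dim = 300
topic_a = rng.standard_normal(dim)
topic_b = rng.standard_normal(dim)
neighbors = [topic_a + 0.3 * rng.standard_normal(dim) for _ in range(5)]
neighbors += [topic_b + 0.3 * rng.standard_normal(dim) for _ in range(5)]

senses = induce_senses(neighbors)
print(len(senses), "sense vectors found")
```

The threshold only works because of the two conditions stated in the abstract: related terms score high and unrelated terms score near zero, which leaves a wide gap in which to place the cut.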

Similar resources

Unsupervised Word Sense Induction from Multiple Semantic Spaces with Locality Sensitive Hashing

Word Sense Disambiguation is the task dedicated to the problem of finding out the sense of a word in context, from all of its many possible senses. Solving this problem requires to know the set of possible senses for a given word, which can be acquired from human knowledge, or from automatic discovery, called Word Sense Induction. In this article, we adapt two existing meta-methods of Word Sens...

Distributional Semantics Approach to Thai Word Sense Disambiguation

Word sense disambiguation is one of the most important open problems in natural language processing applications such as information retrieval and machine translation. Many approach strategies can be employed to resolve word ambiguity with a reasonable degree of accuracy. These strategies are: knowledge-based, corpus-based, and hybrid-based. This paper pays attention to the corpus-based strategy...

Kim, Su Nam and Timothy Baldwin (to appear) Word Sense Disambiguation and Noun Compounds, ACM Transactions on Speech and Language Processing

In this paper, we investigate word sense distributions in noun compounds (NCs). Our primary goal is to disambiguate the word sense of component words in NCs, based on investigation of “semantic collocation” between them. We use sense collocation and lexical substitution to build supervised and unsupervised word sense disambiguation (WSD) classifiers, and show our unsupervised learner to be supe...

Utilizing the One-Sense-per-Discourse Constraint for Fully Unsupervised Word Sense Induction and Disambiguation

Recent advances in word sense induction rely on clustering related words. In this paper, instead of using a clustering algorithm, we suggest to perform a Singular Value Decomposition (SVD) which can be guaranteed to always find a global optimum. However, in order to apply this method to the problem of word sense induction, a semantic interpretation of the dimensions computed by the SVD is requi...

UMND1: Unsupervised Word Sense Disambiguation Using Contextual Semantic Relatedness

In this paper we describe an unsupervised WordNet-based Word Sense Disambiguation system, which participated (as UMND1) in the SemEval-2007 Coarse-grained English Lexical Sample task. The system disambiguates a target word by using WordNet-based measures of semantic relatedness to find the sense of the word that is semantically most strongly related to the senses of the words in the context of t...

Journal title:
  • CoRR

Volume abs/1802.02605  Issue 

Pages  -

Publication date 2018